HoneyPLC: A Next-Generation Honeypot for Industrial Control Systems Efrén López Morales Arizona State University edlopezm@asu.edu Carlos Rubio-Medrano Texas A&M University - Corpus Christi carlos.rubiomedrano@tamucc.edu Adam Doupé Arizona State University doupe@asu.edu Yan Shoshitaishvili Arizona State University yans@asu.edu Ruoyu Wang Arizona State University fishw@asu.edu Tiffany Bao Arizona State University tbao@asu.edu Gail-Joon Ahn Arizona State University Samsung Research gahn@asu.edu ABSTRACT Industrial Control Systems (ICS) provide management and control capabilities for mission-critical utilities such as the nuclear, power, water, and transportation grids. Within ICS, Programmable Logic Controllers (PLCs) play a key role as they serve as a convenient bridge between the cyber and the physical worlds, e.g., controlling centrifuge machines in nuclear power plants. The critical roles that ICS and PLCs play have made them the target of sophisticated cyberattacks that are designed to disrupt their operation, which creates both social unrest and financial losses. In this context, honeypots have been shown to be highly valuable tools for collecting real data, e.g., malware payload, to better understand the many different methods and strategies that attackers use. However, existing state-of-the-art honeypots for PLCs lack sophisticated service simulations that are required to obtain valuable data. Worse, they cannot adapt while ICS malware keeps evolving, and attack patterns become more sophisticated. To overcome these shortcomings, we present HoneyPLC, a high-interaction, extensible, and malware-collecting honeypot supporting a broad spectrum of PLCs models and vendors. Results from our experiments show that HoneyPLC exhibits a high level of camouflaging: it is identified as real devices by multiple widely used reconnaissance tools, including Nmap, Shodan's Honeyscore, the Siemens Step7 Manager, PLCinject, and PLCScan, with a high level of confidence. We deployed HoneyPLC on Amazon AWS and recorded a large amount of interesting interactions over the Internet, showing not only that attackers are in fact targeting ICS systems, but also that Honey-PLC can effectively engage and deceive them while collecting data samples for future analysis. ACM Reference Format: Efrén López Morales, Carlos Rubio-Medrano, Adam Doupé, Yan Shoshitaishvili, Ruoyu Wang, Tiffany Bao, and Gail-Joon Ahn. 2020. HoneyPLC: A Next-Generation Honeypot for Industrial Control Systems. In Proceedings of Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. CCS '20, November 9–13, 2020, Virtual Event, USA © 2020 Association for Computing Machinery. ACM ISBN 978-1-4503-7089-9/20/11. . . $15.00 https://doi.org/10.1145/3372297.3423356 the 2020 ACM SIGSAC Conference on Computer and Communications Security (CCS '20), November 9–13, 2020, Virtual Event, USA. ACM, New York, NY, USA, 13 pages.https://doi.org/10.1145/3372297.3423356 1 INTRODUCTION Industrial Control Systems (ICS) are widely used by many industries including public utilities such as the power grid, water, and telecommunications [37]. These utilities are integral to people's daily life, and any interruption to them may cause significant damage and losses. The increasingly interconnected nature of modern ICS makes them more vulnerable than ever to cyberattacks. For example, a cyberattack that targets a power grid would potentially lead to blackouts in a city or across an entire geographical region. Regrettably, this proposition is no longer a fiction. The number of attacks targeting ICS has been steadily increasing since the infamous Stuxnet malware first showed the world that ICS networks are not secure [13]. Also, in 2015, a cyberattack targeting the Ukrainian power grid successfully took down several of its distribution stations. The ensuing outages left approximately 225,000 people without access to electricity for several hours [11]. One of the key components of ICS networks are Programmable Logic Controllers, better known as PLCs [37]. They control missioncritical electrical hardware such as pumps or centrifuges, effectively serving as a bridge between the cyber and the physical worlds. Because of their critical role, PLCs have been recently targeted by cyberattacks, which attempt to disrupt their proper functioning in an effort to affect their corresponding ICS as a whole. As an example, PLCs were the primary target of the Stuxnet malware as they controlled critical physical processes in a nuclear facility. To better understand cyberattacks against ICS and PLCs, several honeypots have been proposed [3, 9, 14, 15, 21, 40]. However, current honeypot implementations for ICS fail to provide the necessary features to capture data for most recent and sophisticated attack techniques. For example, a common limitation exhibited by most of the existing approaches in the literature is their low-interaction nature: these approaches usually rely on basic and shallow simulations of network protocols, which usually lack complex functionality that limit the attack vectors and makes them easy to discover by attackers. These shortcomings heavily restrict the value of the attack data that can be gathered by these ICS honeypots. Providing a solution to these issues comes with a set of unique challenges. First, it is difficult to achieve meaningful, step-by-step
protocol simulation that can eventually result in high-level, deceiving interactions between honeypots and attackers. These inadequate simulations complicate concealing the true nature of honeypots up to the point accurate and valuable data, e.g., the actual malicious ladder logic code itself can be retrieved from attackers for further analysis. Second, several network protocols largely used in ICS, e.g., S7comm [40], are proprietary, in the sense that no detailed documentation on them is publicly available, which prevents an effective understanding of the protocol, including hidden configuration parameters as well as implicit, undocumented assumptions, which can ultimately reveal the true nature of a honeypot to an attacker. Moreover, existing PLCs used in practice vary in terms of configuration settings, supported protocols, and the way they are customized for different application domains. Creating a general framework that can effectively support such heterogeneity of PLCs devices, regardless of their manufacturing brand and model, without requiring the edition of large and clumsy configuration files, represents a non-trivial challenge. To alleviate the aforementioned concerns targeting ICS worldwide, and effectively tackle the research challenges just discussed, this paper presents HoneyPLC: a high-interaction, extensible, and malware-collecting honeypot modeling PLCs, which is specifically crafted for ICS. HoneyPLC includes advanced simulations of the most common network protocols found in PLCs, namely, the TCP/IP Stack, S7comm, HTTP, and SNMP, addressing the challenges introduced by inadequate simulations and protocol closeness as discussed before. As an example, our TCP/IP Stack simulation benefits from the introduction of a novel technique called fingerprint reversing, which allows for accurately modeling TCP, ICMP, and UDP probes at runtime, providing an effective, customized response to each interaction as initiated by an attacker, largely increasing the level of engagement and subsequent deception. In addition, our simulation of the S7comm protocol, which is core to PLC communications, provides a level of simulation that is able to trick even proprietary tools such as the Siemens Step7 Manager [8]. Moreover, HoneyPLC also provides enhanced extensibility features, allowing for PLCs of different models and manufacturing brands to be effectively simulated. Thus addressing the PLC heterogeneity challenge just discussed. We have successfully tested this feature using five real PLCs, allowing for HoneyPLC to currently support out-of-thebox the Siemens S7-300, S7-1200, and S7-1500; the Allen-Bradley MicroLogix 1100, and the ABB PM554-TP-ETH PLCs. HoneyPLC also implements an advanced simulation of the internal memory blocks featured by modern PLCs, allowing for the automated capture and storage of malicious ladder logic programs, which can be later analyzed to reveal new attacking techniques. The features just discussed are, to the best of our knowledge, exclusive to HoneyPLC, and also significantly advance the state-ofthe-art for ICS honeypots. This positions HoneyPLC as a convenient and flexible tool that can serve as a reliable basis for the analysis and understanding of emerging threats and attacks, as well as the subsequent development of protection techniques for ICS. This paper makes the following contributions: (1) We provide a summary of the limitations and shortcomings of existing ICS Honeypots and discuss how they address (or not) emerging malware threats, as well as new ICS technology, e.g., new PLC models and ICS network protocols. (2) We present HoneyPLC, a high-interaction honeypot for PLCs, which not only solves many of the limitations of related approaches, but also provides convenient support for further understanding and eventually defeating emerging threats for ICS. (3) We introduce the HoneyPLC PLC Profiler Tool, which allows for the effective simulation of many different PLCs regardless of their model and manufacturer. (4) Finally, we provide experimental evidence showing that HoneyPLC is not only effective at engaging and deceiving state-ofthe-art tools for network reconnaissance, but also outperforms existing honeypots in the literature, achieving a performance level comparable to real PLC devices. In an effort to further open and produce reproducible science, HoneyPLC and all our experimental results are available online 1 . 2 BACKGROUND AND RELATED WORK In this section, we state the background of PLCs (Sec. 2.1), network reconnaissance tools Sec. ( 2.2), ICS malware (Sec. 2.3), and ICS honeypots (Sec. 2.4). 2.1 Programmable Logic Controllers A Programmable Logic Controller (PLC) is a small industrial computer designed to perform logic functions based on input provided by electrical hardware such as pumps, relays, mechanical timers, switches, etc. PLCs have the capability of controlling complex industrial processes, making them ubiquitous in ICS and SCADA environments [36]. Some popular PLC manufacturers include Siemens [34], Allen-Bradley [6], and ABB [5]. Internally, PLCs have programmable memory blocks that store instructions to implement different functions, for example, input and output control, counting, logic gates, and arithmetic calculations. 2.2 Network Reconnaissance Tools Nmap. Nmap or "Network Mapper" [22], is a popular open source utility that is able to detect the operating system and services that a particular device is running by sending raw IP packets over the network. Once a given detection scan is completed, Nmap can either report a single OS match or a list of potential OS guesses, each guess with its own confidence percentage rate, in the range 0 to 100, where 0 denotes the complete absence of confidence, and 100 denotes a complete confidence on the projected guess result. PLCScan. PLCScan [33] is a reconnaissance tool used to scan PLC devices in a given network. PLCScan reveals PLCs that implement the S7comm protocol over TCP port 102 or the Modbus protocol over TCP port 502. It is written as a command line Python script and lists PLC information including basic hardware, serial number, name of the PLC, and firmware version. Shodan. Shodan is a search engine and crawler [23] specifically tailored for devices exposed across the Internet, e.g., webcams, routers, ICS devices, among others. The Shodan Honeyscore (part of the Shodan API [23])is a tool that checks whether a device is a honeypot or not. Given an IP address, the Shodan Honeyscore 1https://github.com/sefcom/honeyplc.
Table 1: Comparison of Existing PLC Honeypots in the Literature and HoneyPLC. Keys: × = No Coverage; 1/2 = Limited Coverage; = Optimal Coverage. Approach/ Feature Extensibility TCP/IP Stack Simulation Out-of-the-Box PLCs ICS Network Services Ladder Logic Capture Physics Interaction Logging Gaspot [39] × × 1/2 × × 1/2 SCADA HoneyNet [3] × 1/2 × × Conpot [15] 1/2 × 1/2 × × Digital Bond's Honeynet [38] × × 1/2 × × DiPot [10] 1/2 × 1/2 × × SHaPe [19] 1/2 × 1/2 1/2 × × CryPLH [9] × 1/2 1/2 × × × S7commTrace [40] 1/2 × 1/2 1/2 × × Antonioli et al. [7] 1/2 1/2 1/2 × 1/2 HoneyPhy [21] 1/2 × 1/2 1/2 × 1/2 × HoneyPLC × Paper Sections Addressing Feature 4.2, 5.2 4.3, 5.4, 5.5 4.2, 5.2 4.3, 5.7 4.4, 5.8 6 4.5 calculates the probability that the host is a honeypot, in a range between 0.0 and 1.0, where 0.0 means that the host is definitively a real system, and 1.0 means the host is definitively a honeypot. According to Shodan's creator, the following criteria is used for calculating Honeyscores [24]: (1) too many open network ports; (2) a service not matching the environment, for example, an ICS device running on AWS EC2; (3) known default settings of known honeypots; (4) if a host was initially classified as a honeypot, then it is highly likely that it remains a honeypot today, even though its configuration may look real; (5) a Machine Learning classification algorithm (not disclosed); and, finally, (6) the same configuration being used across multiple honeypots. 2.3 Exemplary ICS Malware Stuxnet. The first ever-documented cyber-warfare weapon, Stuxnet, was a turning point in the history of Cybersecurity [12], targeting PLC models 315 and 417 made by Siemens to modify their inner ladder logic code while concealing itself from ICS administrators [20]. The malware would first spread itself via USB sticks and the local network, looking for vulnerable Windows workstations. Later, it would proceed to infect the Step7 and WinCC Siemens proprietary software by hijacking a Dynamic-Link Library (DLL) file used to communicate with the PLCs. Finally, the malicious ladder logic payload would be dropped only on the aforementioned models based on specific manufacturer numbers and memory blocks. Kemuri Incident. In 2016, attackers were able to gain access to the network of an undisclosed water supply company identified by the alias Kemuri [13], targeting hundreds of ICS devices such as valves, back-office documents, and PLCs that managed the chemical processes for water treatment. The attackers successfully altered the amount of chemical and disrupted the water supply, causing considerable damage and putting human lives at risk. Crashoverride. Otherwise known as Industroyer [35], CRASH-OVERRIDE is a sophisticated malware designed to disrupt ICS networks used in electrical substations. It shows in-depth knowledge of ICS protocols used in the electrical industry that would only be possible with access to specialized industrial equipment. CRASHOVERRIDE dealt physical damage by opening circuit breakers and keeping them open even if the grid operators tried to close them back to restore the system. It is believed to have been the cause of the power outage in Ukraine in December of 2016 [13]. 2.4 Honeypots for ICS Honeypots are computer systems that purposefully expose a set of vulnerabilities and services that can be probed, analyzed and ultimately exploited by an attacker [28], allowing for all possible interaction data to be monitored, logged and stored for future analysis. A summary of existing ICS honeypots is shown in Table 1. Low-Interaction Honeypots. Low-interaction honeypots offer the least amount of functionality to an attacker [25, 28]. The services exposed by this kind of honeypot are usually implemented using simple scripts and finite state machines. Because of their limited interaction, attackers may not be able to complete their attack steps or may even realize that their target is a fake system. On the other hand, low-interaction honeypots cannot be fully compromised as they are not real systems, which greatly reduces maintenance costs and time invested in configuration and deployment. Gaspot [39] is a low-interaction honeypot written as a Python script that simulates a gas tank gauge. It can be modified to change temperature, tank name, and volume. The SCADA HoneyNet Project was the first honeypot implementation specifically built for ICS [3, 38]. This
project was aimed at developing a software framework capable of simulating ICS devices like PLCs using Python scripts. Conpot [15] is also a low-interaction ICS honeypot implementation that simulates a Siemens S7-200 PLC, and can be manually modified to simulate other PLCs by editing an XML file. High-Interaction Honeypots. High-interaction honeypots lie on the other side of the spectrum, as they strive to offer the same level of interaction as a real system [25]. CryPLH is a high-interaction honeypot that simulates a S7-300 Siemens PLC [9], and includes HTTP, HTTPS, S7comm and SNMP services running on a Linux host that has been modified to accept connections on specific ports. The S7comm protocol is simulated by showing an incorrect password response and the TCP/IP Stack is simulated via the Linux kernel. S7commTrace [40] provides a high-interaction simulation of the S7comm protocol and supports the Siemens S7-300 PLC. Antonioli et al. [7] proposed a high-interaction honeypot that leverages the MiniCPS framework to simulate the Ethernet/IP protocol and a generic PLC. HoneyPhy [21] provides a novel physics-aware model to simulate a generic analog thermostat and the DNP3 protocol. 3 LIMITATIONS OF CURRENT HONEYPOTS Despite the benefits of honeypots previously discussed, current honeypots for ICS in the literature, shown in Table 1, fail to provide the necessary features to capture data on the latest and most sophisticated attacks, exhibiting the following limitations: L-1 Limited Extensibility. A common limitation in the current literature is the narrow extensibility support for the many different PLC devices and network services that are used in ICS in practice and have already been targeted by recent attacks. As an example, Stuxnet and the Kemuri attack targeted different kinds of PLCs, whereas CRASHOVERRIDE targeted different network services, as it was discussed in Sec. 2. Following Table 1, several approaches in the literature provide limited extensibility capabilities, which mostly include the manual edition of XML files to support additional PLCs. This process, besides being tedious and time-consuming, may be highly error-prone, and may ultimately reveal the true nature of a honeypot to attackers if implemented incorrectly. This is aggravated by the fact most of the approaches in the literature support only one or two PLC models out-of-the-box, as it is shown in Table 3. As we will discuss in Sec. 4.2, the HoneyPLC Profiler Tool can be customized to support PLCs of different brands and models. As an example, HoneyPLC currenlty provides out-of-the-box support for 5 PLCs of three major brands, as detailed in Sec. 5.2. L-2 Limited Interaction. Current approaches mostly provide limited functionality when it comes to TCP/IP Stack simulations, as well as native ICS network protocols, as described in Sec. 2. This is a serious limitation that stops current approaches from extracting value from adversarial interactions and malware. As an example, CRASHOVERRIDE, leveraged advanced ICS protocol features that are not supported by low-interaction honeypots. This would ultimately result in the loss of highly valuable data. Even high-interaction honeypots fail to provide advanced enough protocol simulations. For example, CryPLH [9] implements the S7comm protocol using a Python script that only simulates an incorrect password screen. HoneyPLC solves this limitation by provided extended support for various networks protocols implemented by means of its dedicated PLC Profiles, as we will discuss in Sec. 4.3, and evaluate through experiments in Sec. 5.4–5.7. L-3 Limited Covert Operation. The moment an attacker discovers the true nature of a honeypot, it is game over, as the attacker might stop interacting with it altogether and stop revealing her attack methods. Therefore, honeypots should aim to fool widely used network reconnaissance tools, e.g., Nmap, introduced in Sec. 2.2, to maintain their covert operation. In such regard, the SCADA HoneyNet Project [3] is the only approach in the literature that provides a convincing deception to attackers. Also, Linux Kernel simulations, implemented by several approaches in the literature, e.g., CryPLH, fail to deceive Nmap. Other work fails to attempt or even mention such a crucial feature. To overcome this limitation, HoneyPLC provides advanced network simulations intended to deceive reconnaissance tools such as Nmap, as we demonstrate in Sec. 4.3. L-4 No Malware Collection. The highly specialized nature of ICS devices calls for better analysis, dissection, and understanding techniques specifically tailored for emerging malware trends. In such regard, honeypots are a great tool to collect and analyze malware [29]. However, as shown in Table 1, there exist no honeypots for ICS in the literature that can provide such functionality, which has serious consequences for the security of ICS in the presence of recent malware-injection attacks. For instance, Stuxnet injected malicious ladder logic code to the targeted PLCs. To solve this, HoneyPLC provides a novel feature to capture ladder logic, as described in Sec. 4.4 and Sec. 5.8. 4 HONEYPLC: A CONVENIENT HIGH-INTERACTION HONEYPOT FOR PLCS Having described the limitations of existing approaches, we now present HoneyPLC, an extensible, high-interaction, and malwarecollecting honeypot for ICS. HoneyPLC provides advanced protocol simulations, e.g., TCP/IP, S7comm, HTTP, and SNMP, achieving an interaction level comparable to real PLCs, ultimately introducing low-to-moderate levels of risk as well as low maintenance costs. We start by providing an illustrative use case scenario, which exemplifies how the different inner modules and components of HoneyPLC interact with an attacker at runtime when an attempt to compromise a PLC is made. Later, we elaborate on how HoneyPLC solves each of the limitations highlighted in Sec. 3. 4.1 Illustrative Use Case Scenario Initial Setup. As it will be further discussed in Sec. 4.2, HoneyPLC can be extended to simulate PLCs of different models, communication protocols, and/or manufacturer brands. With that in mind, the very first step when using HoneyPLC includes choosing the PLC Profile featuring the desired real-life PLC that will be exposed to attackers as a honeypot. This process is shown in Fig. 1 (Step 1). PLC Profiles can be chosen from a dedicated repository included as
a part of HoneyPLC. For the rest of this case scenario, let us assume the S7-1200 model is selected. Fingerprinting. Once HoneyPLC is deployed, an attacker may try to fingerprint it using a reconnaissance tool such as Nmap or PLCScan (Fig. 1 (Step 2)). When initial contact is established, all the TCP/IP requests will be handled by the HoneyPLC's Personality Engine, which in turn, is based on features provided by the Honeyd [2] tool, as it will be further discussed in Sec. 4.3. (Fig. 1 (Step 3)). Since the S7-1200 PLC model was selected in the beginning, the Personality Engine will use the appropriate fingerprint contained within the PLC Profile to reply to communications started by Nmap. At this point, Nmap may confirm to the attacker that she is dealing with a PLC and not a honeypot, as we show in Sec. 5. Reconnaissance. In a subsequent step, an attacker might try to initiate an S7comm connection to check what PLC memory blocks are available. As mentioned in Sec. 2, such a process is crucial when attempting to modify the inner ladder logic code of a PLC. The connection is first handled by the HoneyPLC's Network Services module, and later forwarded to a dedicated S7comm server. (Fig. 1 (Step 4)). The S7comm server then replies with the requested information and the Integration Framework forwards the replies to the attacker. In the meantime, the S7comm server is logging all the interactions, including the attacker's source IP address and memory block requests made to the PLC. Code Injection. At this point, when the attacker identifies a PLC memory block suitable for injection, he/she uses an S7comm application like PLCinject [4] to load ladder logic code into the PLC, effectively overwriting any pre-existing code and introducing a custom-made malicious payload. (Fig. 1 (Step 5)). As a result, the HoneyPLC's S7comm server will write the code into the dedicated HoneyPLC repository, which is managed by the Interaction Data module. (Fig. 1 (Steps 6 and 7)). After this case scenario has been completed, HoneyPLC may have been able to collect crucial information about the attack inside its logging infrastructure: (1) the public IP address of the attacker, (2) the specific PLC memory blocks the attacker was targeting, and best of all, the critical piece, (3) the ladder logic program he/she has injected. Later on, such a malware sample can be analyzed at the byte level to get a better understanding of the malicious instructions that the attacker wanted the PLC to execute. In Sec. 6, we elaborate on this idea as a part of our future work. 4.2 Supporting PLC Extensibility PLC Profiles. The PLC Profile Repository, shown in Fig. 1 (Step 1), is a collection of PLC Profiles that hold all the required data to simulate a given PLC. It communicates with the Integration Framework and Network Services modules to customize the PLC that HoneyPLC is simulating at any given time and addresses the lack of extensibility discussed in Limitation L-1. In turn, a PLC Profile is a collection of three discrete datasets, which allow HoneyPLC to simulate a particular PLC device by means of highly customized simulations of network interactions, as it will be discussed in Sec. 4.3. • SNMP MIB. A Management Information Base (MIB) is an standard used by SNMP agents. Because most PLC devices implement a simple SNMP agent, a custom MIB is needed for Honey-PLC to provide a realistic SNMP simulation. Figure 1: The Architecture of HoneyPLC. Before deployment, a PLC Profile is selected from a repository (1). Later, at runtime, an attacker may initiate contact via a dedicated protocol, e.g., S7comm (2). Communications are then processed by the Personality Engine (3), later forwarded to the S7comm Server (5), and are eventually logged by the Interaction Data Framework (6). Finally, all code injected by the attacker is captured within the repository module (7). • Nmap Fingerprint. A plain text file with the Nmap fingerprint to effectively simulate the TCP/IP Stack of a particular PLC device. As it will be detailed later in this Section, this fingerprint allows HoneyPLC to effectively engage and deceive well-known reconnaissance tools such as Nmap. • Management Website. Some PLC devices provide a light webserver with a splash screen and some configuration options. Because of this, a PLC Profile includes a copy of the such website, including, but not limited to, image, HTML and CSS files. PLC Profiler Tool. The HoneyPLC Profiler Tool automates the creation of new HoneyPLC Profiles. It interfaces with three different applications: Nmap, (Sec. 2.2), snmpwalk [30], and wget [31]. To obtain the profile for a target PLC, the HoneyPLC Profiler requires the IP address of the PLC device as the only input. Then, the Profiler runs a series of queries to obtain the three discrete sets of data from the target PLC described before: an SNMP MIB, a website directory, and an Nmap fingerprint. First, snmpwalk is used for reading all the available Object IDs (OIDs) from the public community string, creating an identical MIB to the one used by the PLC. OIDs may include, among other important configuration settings, the unique identifier of the PLC, as well as its base IP address. Second, Nmap's OS detection is used to get the TCP/IP stack fingerprint of the target PLC, in a process that includes scanning all well-known TCP and UDP ports. This fingerprint will be later leveraged by HoneyPLC's Integration Framework to provide meaningful TCP/IP interactions
Figure 2: The HoneyPLC Personality Engine: First, a PLC Profile is selected from the repository, including its Nmap Fingerprint (1). When an attacker tries to fingerprint HoneyPLC using Nmap, such a tool will send a series of Probes to determine the OS or Device (2). HoneyPLC will then reply with appropriately crafted responses that simulate a real PLC, thus effectively deceiving Nmap and the attacker (3). as a response to requests initiated by an attacker. Third, wget is used to download a complete copy of the splash screen or administration website, if any. Finally, the HoneyPLC Profiler will create a custom directory that can be used by HoneyPLC, inside its dedicated PLC Profile Repository, shown in Fig. 1 (1), to simulate the target PLC. 4.3 Supporting Operational Covertness TCP/IP Simulation. Within HoneyPLC's Integration Framework, depicted in Fig. 1, a sophisticated TCP/IP Stack simulation is implemented by leveraging Honeyd [28], a popular framework for honeypot simulation, as well as Nmap, discussed in Sec. 2.2. The process is depicted in Fig. 2. Initially, when a new PLC is to be modeled by HoneyPLC, Nmap is used to generate a detailed TCP/IP Stack fingerprint for it. Next, such a fingerprint is integrated with the Honeyd fingerprint database, by appending it to Honeyd's nmap-os-db text file. Later, at runtime, when a tool like Nmap tries to fingerprint a HoneyPLC host, HoneyPLC Personality Engine, leveraging Honeyd, will respond with the appropriate fingerprint information. To achieve this, the Engine reads a particular fingerprint from Nmap's database and reverses it, which means that when Honeyd simulates a particular device, it introduces its IP/TCP Stack peculiarities: TCP SYN packet flags, IMCP packet flags, and timestamps. The generation of accurate Nmap fingerprints imposed a variety of challenges. First, PLC devices of different manufacturers and models use different UDP and TCP ports that are not standard, or may not be properly defined within the device manuals, e.g., port 2222 for the MicroLogix 1100 PLC. The lack of heterogeneity required us to perform a manual inspection, which was time-consuming and error-prone. Second, we analyzed the Nmap reports that contain the fingerprint results and modified the format to be compatible with the Honeyd fingerprint database. Additionally, the creation of accurate Honeyd templates brought its own set of challenges. For HoneyPLC to provide enhanced interaction capabilities, which can engage attackers for extended periods of time (as we further Figure 3: The HoneyPLC SNMP and the Webserver Agents. The MIB database as well as the Website HTTP files, obtained from a PLC Profile, are first loaded by each agent (1). Then, the Attacker may use SNMPWalk as well as an HTTTP Client to established connection with HoneyPLC (2). Later, each agent with reply to each request using the information obtained from the PLC Profile (3). describe in Sec. 4.3), we significantly improved the standard simulation scripts included within Honeyd. Specifically, we used the subsystem virtualization feature provided by Honeyd: this feature facilitates the integration of the different HoneyPLC components. S7comm Server. Within HoneyPLC's Network Services Module, depicted in Fig. 1, the S7comm server provides a sophisticated simulation of the Siemens proprietary protocol. It simulates a real Siemens PLC and exposes several memory blocks via TCP port 102. At the time of writing this paper, Siemens had not released the specifications of S7comm protocol and the information that is available has been collected by third parties like the Snap7 project [27] and the Wireshark Wiki [1]. We leveraged the Snap7 framework [27, 32] to write an S7comm server application in C++. We modified and recompiled the source code of the main Snap7 library to add our own features. These include logging the S7comm interactions, ladder logic capture, and PLC firmware specifications for all three Siemens PLC models. For example, CPU model, serial number, PLC name label, copyright among others. SNMP Server. Within HoneyPLC's Network Services Module, the SNMP Agent implements an advanced simulation of the SNMP protocol along with believable MIB data, effectively allowing HoneyPLC to reply to any external SNMP server query. SNMP is commonly used in practice to monitor network connected devices and listens to requests over UDP port 161. Since real PLCs do implement SNMP agents, implementing this sub-component adds to the deception capabilities of HoneyPLC. Our simulation process, shown in Fig. 3 (top), can be described as follows: In practice, a typical SNMP setup includes a Manager as well as an Agent module. The SNMP Manager continually queries the Agent for up-to-date data. A SNMP Agent exposes a set of data known as Management Information base or MIB. In order to simulate the SNMP protocol, we use the light Python application snmpsim, which simulates a SNMP Agent based on real time or archived MIB data. When a SNMP request is received by HoneyPLC, the SNMP Agent replies with an OID as a real PLC would do.
HTTP Server. Finally, the HoneyPLC's HTTP server provides an advanced simulation of the HTTP server of the Real PLCs and serves websites found in real PLCs, as illustrated in Fig. 3 (bottom). As an example, most Siemens PLC devices include an optional HTTP service to manage some of its internal configuration features. This functionality was in turn implemented with lighttpd [18], a lightweight webserver to handle all HTTP quests. When an HTTP request hits HoneyPLC, its Integration Framework relays the request to the lighttpd server. Later, the webserver replies with the website data from a HoneyPLC profile. 4.4 Ladder Logic Collection HoneyPLC's S7comm Server holds the novel Ladder Logic Capture feature. It writes any ladder logic program that an attacker uploads to HoneyPLC. When an adversary uploads a ladder logic program to any of the S7comm Server memory blocks, while trusting it to be a real PLC, this feature automatically writes them into the file HoneyPLC filesystem with the corresponding timestamp. These captured ladder logic programs can be analyzed at a later stage at the byte level to expose ladder logic instructions and then extract new attack patterns used by adversaries targeting PLCs. We implemented the Ladder Logic Capture component leveraging the Snap7 framework using C++, in a similar fashion as the S7comm Server. Additionally, we modified the Snap7 framework main library files to integrate this feature at the Linux OS level. 4.5 Implementing Record Keeping via Logging The Interaction Data component holds all of the interaction data gathered by HoneyPLC. It maintains two kinds of data. First, it contains all logs produced by our S7comm servers, the SNMP agent and the HTTP server. Second, it contains all the ladder logic programs that get injected via the S7comm server. This component communicates directly with the Network Services component. We configured Honeyd, lighttpd, snmpsim and the S7comm Server to automatically log all interactions. The S7comm Server writes to the file system all interactions including IP address of originating host, timestamp, memory block ID in the case of reading or writing. Next, snmpsim logs IP information, what OIDs were accessed and timestamps. Finally, the lighttpd webserver includes all the major features of a modern webserver with detailed logging that includes IP address information, accesses website files and timestamps. All of them log every interaction all the time. 5 EVALUATION As shown throughout Sec. 4, HoneyPLC is designed to effectively deceive attackers into believing that they are dealing with real PLCs. This section starts by enumerating a set of experimental questions, which are based on the limitations of existing approaches as presented in Sec. 3. Then we present a series of experiments designed to provide affirmative answers to each question backed up by experimental evidence. For this purpose, we used the following PLC models: Siemens S7-300, S7-1200, and S7-1500, as well as the Allen-Bradley MicroLogix 1100 and the ABB PM554-TP-ETH, which are shown in Fig. 4, as these models are common in practice. As an Table 2: Experimental Comparison of PLC Honeypots. Keys: × = No Coverage; 1/2 = Limited Coverage; = Optimal Coverage. Expr. Conpot [15] SCADA Honey-Net [3] Gaspot [39] S7 comm-Trace [40] Honey-PLC Nmap (Sec. 5.4) 1/2 1/2 1/2 PLCScan (Sec. 5.4) 1/2 N/A 1/2 Honeyscore (Sec. 5.5) × × × Step 7 Manager (Sec. 5.6) × × N/A × PLCinject (Sec. 5.8) × 1/2 × × Figure 4: PLCs procured for experimental purposes including, from left to right, Siemens S7-300, S7-1500, S7-1200, Allen-Bradley MicroLogix 1100, and ABB PM554-TP-ETH. example, a query2 on Shodan [23], shows more than a 1,700 Internetfacing PLCs across several different countries. For each experiment, we describe its environmental setup, the methodologies used, and the results obtained. Table 2 shows a summary of the experiments we performed comparing HoneyPLC with other honeypots in the literature whose implementation were either available online or were obtained from their authors upon request. A description of the obtained results is provided next, and an extended discussion comparing HoneyPLC with related work is shown in Sec. 6. 5.1 Experimental Questions Q-1 Can HoneyPLC support different real PLCs? Since current approaches provided limited support for various types of PLCs being widely used by ICS in practice, we were interested in exploring the capabilities of HoneyPLC to model different PLCs using the PLC Profiler Tool described in Sec. 4.2. This question is related to Limitation L-1, as discussed in Sec. 3. We strive to answer to this question in Sec. 5.2 and 5.3. Q-2 Can HoneyPLC conceal its honeypot nature from attackers? 2https://www.shodan.io/search?query=siemens+port%3A102.
More specifically, can HoneyPLC fool widely-used reconnaissance tools? Also, we were interested in obtaining evidence regarding the interactions HoneyPLC may have obtained when deployed in the wild, i.e., via an Internet connection. This question is related to Limitations L-2 and L-3. We elaborate on this question in Sec. 5.4, Sec. 5.5, and Sec. 5.7. Q-3 Can HoneyPLC effectively capture Ladder Logic code? Since capturing Ladder Logic code represents a highly desirable feature for analyzing threats to ICS, we were interested in exploring the capabilities of HoneyPLC, as described in Sec. 4, to properly carry out such task. This question is related to Limitation L-4 and is addressed in Sec. 5.8. 5.2 Case Study: Profiling Siemens PLCs As mentioned in Sec. 3, current state-of-the-art honeypots for PLCs have been modeled over a limited number of PLCs, as shown in Table 3, and support for any extensions is quite limited. Therefore, we were interested in exploring the capabilities of HoneyPLC to support PLCs of different models and manufacturers. Environment Description. For our first case study, we procured three Siemens PLCs: the S7-300, the S7-1200 and the S7-1500 models, which are shown in Fig. 4. Each PLC was connected to a special power supply and data or Ethernet cables. Additionally, we used the Siemens Step7 Manager, tools to configure IP addressing. We also deployed the HoneyPLC Profiler Tool and Python 3 in a laptop host where we connected our PLCs. Methodology. We connected each PLC model to our experimental laptop host and used our command-line-based HoneyPLC Profiler Tool to create the PLC Profiles for the three PLCs. To launch the tool, we input the PLC IP address and the name of PLC Profile directory. While the HoneyPLC Profiler Tool starts querying data from the PLC progress messages are shown including error messages, if any. We encountered some difficulties while developing and testing the Profiler Tool. First, we had to expand the number of ports scanned to obtain a better Nmap fingerprint, so that Nmap reports it with a higher confidence. We also had to make adjustments to download the PLC websites to include images and correct HTML paths. Also, it was necessary to manually modify the PLC profile HTML files to correct broken links. Results. Overall, we were successful in creating all three PLC profiles. These profiles were saved in our experimental laptop host file system and were later used in the other experiments depicted in this Section. The HoneyPLC Profiler Tool took approximately 5 minutes to create each profile and we only had to make some small manual modifications to some HTML files, as mentioned before. For PLCs produced by Siemens, the retrieval of their corresponding profiles may be facilitated if the SNMP and the web server services are properly activated beforehand by following the instructions provided by the manufacturer, or by using any other S7commenabled software, e.g., the Step7 Manager. Failure to perform this step may result in the creation of an incomplete profile. 5.3 Case Study: Allen-Bradley and ABB PLCs Additionally, we were interested in exploring the capabilities of HoneyPLC to support PLCs manufacturers other than Siemens, so we can provide some general recommendations for practitioners interested in obtaining additional PLC profiles. Table 3: PLC Devices Supported by ICS Honeypots. Approach Supported PLC Devices Gaspot [39] Veeder Root Guardian AST SCADA HoneyNet [3] Siemens CP 343-1 Conpot [15] Siemens S7-200, Allen Bradley LOGIX5561 Digital Bond's Honeynet [38] Modicon Quantum PLC DiPot [10] Siemens S7-200 SHaPe [19] IEC 61850-Compliant PLC CryPLH [9] Siemens S7-300 S7commTrace [40] Siemens S7-300 Antonioli et al. [7] Generic PLC HoneyPhy [21] Generic Analog Thermostat HoneyPLC Siemens S7-300, S7-1200, S7-1500 Allen-Bradley MicroLogix 1100 ABB PM554-TP-ETH Environmental Description. For this case study, we procured the Allen-Bradley MicroLogix 1100 and the ABB PM554-TP-ETH PLCs, which are shown in Fig. 4. Additionally, we used Allen-Bradley and ABB software tools to configure their IP addresses. Methodology. As with our previous case study, we deployed the HoneyPLC Profiler Tool and Python 3 in a laptop host, and connected each PLC to a special power supply. Also, we connected each PLC model to our experimental laptop host and used our command-line-based HoneyPLC Profiler Tool as before. Results. We successfully produced a profile for each of the PLCs under analysis, and obtained the following recommendations to practitioners. First, for non-Siemens PLCs it may become necessary to identify the network services they provide, as different vendors may implement a variety of protocols on different ports. As an example, the Allen-Bradley MicroLogix 1100 PLC uses port 80 to implement a light web server, similar to Siemens PLCs, whereas such a feature is not implemented by the ABB PM554-TP-ETH. Second, both Non-Siemens PLCs under study also fail to support the SNMP service, which prevents the HoneyPLC Profiler Tool from retrieving a MIB database. Third, the Allen-Bradley MicroLogix 1100 PLC implements the industry standard EtherNet/IP protocol on port 2222 for configuration purposes, which differs from Siemens models that use the proprietary S7comm protocol. These differences may ultimately result in PLC Profiles that are different from the ones obtained for Siemens PLCs, and may need to be subsequently addressed on a case-by-case basis. Fourth, whereas the Siemens PLCs use the proprietary S7comm protocol for loading Ladder Logic programs, the Allen-Bradley MicroLogix 1100 uses the EtherNet/IP protocol. In such regard, the ABB PM554-TP-ETH PLC uses the Nucleus Sand Database, which is mostly used for database record keeping, and whose use in PLC devices is not customary. Because both protocols are not currently supported by HoneyPLC, additional modifications may be required. For example, for the M554-TP-ETH PLC Profile we modified the Honeyd template to open port 1201 as a Nucleus Sand DB simulation that can be used through the subsystem virtualization is not currently supported. For the MicroLogix 1100 PLC Profile, we modified the
90 92 94 96 98 100 S7-1500 (Real) S7-1500 (HoneyPLC) S7-300 (Real) S7-300 (HoneyPLC) S7-1200 (Real) S7-1200 (HoneyPLC) Nmap Confidence (%) Figure 5: Nmap Scan Results. All three Profiles obtained at least a 90% confidence rate. The S7-300 and S7-1200 Profile obtained rates comparable with their real counterparts. Profiler Tool port scan range to include not only well-know ports but also registered ports such as port 2222. Finally, Table 3 provides a comparison of the PLC models supported out-of-the-box by related honeypots for ICS, which were also shown in Table 1. The positive results obtained in our two case studies give support to answer Q-1 in the affirmative. 5.4 Resilience to Reconnaissance Experiment The moment the true nature of HoneyPLC (or any other honeypot) is revealed to an attacker, the quantity and value of the gathered interaction data may significantly decrease. Therefore, we aimed to test the resilience of HoneyPLC to Nmap and PLCScan, described in Sec. 2, which are well-known tools for reconnaissance. Additionally, we tested how existing honeypots, namely Gaspot [39], S7commTrace [40], SCADA HoneyNet [3] and Conpot [26], perform in this regard. Environment Description. Our experimental setup was composed of two physical computers: a desktop and a laptop host. The desktop host featured Ubuntu 18.04 LTS along with HoneyPLC, as well as the following tools: Honeyd, lighttpd, snmpsim, and S7comm server. We built Honeyd version 1.6d from source, the latest version is available in the official GitHub repository [2]. Also, we installed the lighttpd web server version 1.4.45. Next, we installed snmpsim version 0.4.7 and all its dependencies. Finally, we installed our S7comm server and our custom library. Conversely, the laptop host included the latest version of Nmap 7.80 as well as the three Siemens PLCs fingerprints in Nmap's fingerprint database nmap-os-db that were obtained as a result of the previous experiment. Additionally we installed the latest version of PLCScan obtained from GitHub [33]. Both hosts were directly connected via an Ethernet cable. Subsequently we downloaded and deployed the related honeypots mentioned before, and connected them to the scanning host so that all of them would be in the local network. Methodology. To create a baseline to compare the results of our experiments, the Nmap confidence data of the real PLCs featured in the previous experiment was obtained. With that in mind, a 0 0.1 0.2 0.3 0.4 0.5 S7-300 (HoneyPLC) S7-300 (Shodan Real) S7-300 (Shodan Honeypot) S7-1200 (HoneyPLC) S7-1200 (Shodan Real) S7-1500 (HoneyPLC) Honeyscore (0.0 - 1.0) Figure 6: Shodan Honeyscore Results. Our HoneyPLC PLC Profiles perform better than other honeypots found in Shodan and at the same level as real PLCs. second test environment was composed of an additional host with Ubuntu 18.04 LTS and Nmap 7.80. Later, the additional host was directly connected to one of the three different PLCs (S7-300, S7- 1200 and S7-1500) using an Ethernet cable. We installed the Step7 Manager in order to configure the network settings of the PLCs. Next, two different sets of Nmap scans were conducted with OS detection enabled. One set for HoneyPLC and another set for the real PLCs. Each PLC model was scanned 10 times. For the Honey-PLC experiment the corresponding HoneyPLC Profile was installed so that the aforementioned applications were correctly configured. Next, we used PLCScan to scan each PLC Profile in similar fashion as the Nmap methodology. Afterwards we turned to Gaspot, S7commTrace, SCADA HoneyNet and Conpot. Each honeypot was scanned with Nmap's OS detection enabled 10 times. Finally, we used PLCScan on S7commTrace, SCADA HoneyNet and Conpot. Gaspot was omitted as it does not support the S7comm protocol. Results. The results of our Nmap experiment can be seen in Fig. 5, and show that for all three PLC models, the real PLCs gets the best confidence by a small margin. However, our PLC Profiles as provided by HoneyPLC were really close behind, thus providing positive evidence that our approach can provide effective covertness, as required by our question Q-2. These results are encouraging since for all scans across all sets Nmap identified the correct PLC model with the highest confidence. Our PLCScan experiments were also successful, as we were able to obtain and provide real PLC data using PLCScan against HoneyPLC for all three PLC Profiles. In addition, SCADA HoneyNet was identified as a Siemens CP 343-1 PLC, however, Gaspot, S7commTrace and Conpot were fingerprinted as Linux OS with a 100% confidence, with no mention of any PLC device. Regarding PLCScan, Conpot was identified as a S7-200 PLC and SCADA HoneyNet and S7commTrace provided connection information but displayed an empty PLCScan report. Our results are even more significant due to the fact a Linux kernel simulation of the TCP/IP Stack, as implemented by several related approaches, including Gaspot and Conpot, will not deceive Nmap [9].
5.5 Shodan's Honeyscore Experiment As with the previous experiment, Shodan, described in Sec. 2.2, is actively leveraged in practice, along with its corresponding Shodan API to detect honeypots exposed to the Internet with a high degree of accuracy. Therefore, we were interested in the capabilities of HoneyPLC to deal with this state-of-the-art reconnaissance tool. Environment Description. For this experiment, we deployed three AWS EC2 instances accessible from the Internet with the following specifications: 2 vCPUs, 4GB RAM and Ubuntu 18.04 LTS OS, exposing TCP ports 80 and 102 and UDP port 161. Then we deployed HoneyPLC on each one of them featuring all of our three PLC profiles, following the configuration steps detailed in the previous experiment. We also deployed four additional AWS instances hosting Conpot, Gaspot, S7commTrace and SCADA HoneyNet. Methodology. We obtained the Shodan Honeyscores, whose methodology is described in Sec. 2.2, of each of our HoneyPLC PLC Profiles, other honeypots for the same PLC models that were publicly exposed to the Internet and Gaspot, Conpot, S7commTrace and SCADA HoneyNet. For such a purpose, we leveraged Shodan to gather data of Internet-facing real PLCs and PLCs flagged as honeypots. We looked at open ports, geolocation, Honeyscore, PLC model and IP addresses. Later, we compared these data to the one obtained for our HoneyPLC PLC Profiles. Once deployed to the Internet, it took about a week for Shodan to index our honeypots and identify the S7comm and HTTP services on ports 102 and 80. Results. The results of our Shodan experiment, depicted in Figure 6, show that Shodan assigns a Honeyscore of 0.0 to our S7-300 profile and how this Honeyscore compares to real S7-300 PLCs and other S7-300 honeypots found in the wild by. Moreover, our S7-1200 and S7-1500 profiles got a 0.3 Honeyscore, which is comparable with the one obtained by real S7-1200 PLCs as indexed by Shodan. Unfortunately, at the time this experiment was performed, we were not able to find any S7-1200 honeypots in Shodan for comparison. Regarding the other four AWS instances, S7commTrace, Gaspot, and SCADA HoneyNet were not indexed by Shodan as they crashed when Shodan's crawler tried to interact with them. Thus they could not be assigned a Honeyscore. Conpot, however, was successfully indexed and was assigned a 0.3 Honeyscore. Overall, these results add compelling evidence with respect to Question Q-2, showing that HoneyPLC is effective at maintaining covertness against state-of-the-art reconnaissance tools. 5.6 Step7 Manager Experiment We designed an experiment to test the capabilities of the HoneyPLC S7Comm Server, discussed in Sec. 4.3, against Step7 Manager [8], a Siemens proprietary software used to configure as well as to write and upload ladder logic programs to PLCs. For comparison purposes, we attempted to perform the same experiment on Conpot, the SCADA HoneyNet, and S7commTrace, which claim support for the S7comm protocol, as shown in Table 2. Environment Description. For this experiment, we used a Windows XP virtual environment installed on a desktop host. Additionally, we installed HoneyPLC, the related work honeypots shown in Table 2, and all three Siemens PLC Profiles in different Ubuntu 18 LTS VMs and connected them to the Windows XP host. Table 4: Comparison of S7comm function codes. S7comm Implementation Functions Subfunctions HoneyPLC 13 18 S7commTrace 12 14 Methodology. To test the compatibility of a particular honeypot with Step7 Manager, we performed the following: First, we attempted a direct, initial connection to the tool by using the 'Go Online' GUI feature. Second, we used Step7 Manager to list all the memory blocks contained within a given honeypot. Third, we also tried to upload a memory block to each honeypot, and finally, in a reciprocal action, we tried to download the contents of a memory block, which was previously-stored by each honeypot under test. Results. Our results show that HoneyPLC is the only implementation capable of handling all of the functionality previously mentioned, as is shown in Table 2. Conpot, S7commTrace, and SCADA HoneyNet were able to establish the initial connection, however, the Step7 Manager threw a connection timeout error, preventing any further interaction and resulting in an aborted execution. Moreover, as S7commTrace is a high-interaction honeypot that implements features similar to the ones provided by HoneyPLC's S7comm Server, we strove to provide an extended comparison between them. The HoneyPLC S7comm Server improves over S7comm-Trace by providing more functions and subfunctions as shown in Table 4. Specifically, it adds an error response function and insert block, delete block, blink LED ,and cancel password subfunctions. The error response function and the delete and insert block functions, in particular, are important when injecting ladder logic programs and connecting with Step7 Manager. Overall, besides providing compatibility with Step7 Manager, HoneyPLC also provides enhanced capabilities for capturing ladder logic, e.g., reading and writing memory blocks, which are not supported by S7commTrace. 5.7 Internet Interaction Experiment In order to explore the capabilities of HoneyPLC to interact with external, non-controlled agents, e.g., attackers, we designed an experiment intended to expose the PLC Profiles discussed in previous experiments to remote connections via Internet. Environment Description. We leveraged the environmental setup we designed for our previous Shodan-based experiment in Sec. 5.5. Also, we used the same AWS EC2 instances equipped with PLC Profiles for the S7-300, S7-1200, and S7-1500 PLCs. Methodology. We exposed the EC2 instances to the Internet for a period of 5 months. Using the HoneyPLC logging capabilities discussed in Sec. 4.5, we logged all received interactions. Results. As a result of this experiment, more than 5GB of data were recorded. Table 5 shows the different S7comm function commands received by each PLC Profile. The fact that we recorded these functions means that external agents interacted with HoneyPLC beyond a simple connection performing reconnaissance tasks. Additionally, we received 4 PLC Stop functions on our S7-300 Profile, which stops the current ladder logic program execution, suggesting that external agents tried to disrupt the PLCs' operation. Table 6 shows that our honeypots also received thousands of HTTP conversations and logged multiple HTTP authentication
Table 5: S7comm Function Commands Received. PLC Profile Setup Communication Read SZL PLC Stop List Blocks S7-300 600 1013 4 80 S7-1200 202 324 0 0 S7-1500 292 343 0 0 Table 6: HTTP and SNMP Interactions Received. PLC Profile HTTP Conversations HTTP Login Attempts SNMP Get Requests S7-300 2060 205 1925 S7-1200 1791 30 567 S7-1500 13 0 1271 attempts on their administration websites, including the usernames and passwords used by the external parties. Additionally we also recorded thousands of SNMP get requests that downloaded our PLC Profile's MIBs several times. Table 7 shows the distribution of S7comm connections based on geographical location. It can be noted that countries with most connections have historically been either the target or the initiators of attacks against ICS [13] recorded in the literature. Finally, at the time of writing this paper, no attempts to inject malicious ladder logic into our honeypots were recorded. Such an attack would have been signaled by an attempt to write a memory block inside a PLC. Despite this limitation, the amount and nature of the interactions obtained provide additional support for affirmatively answering Question Q-2, showing that HoneyPLC can effectively engage external agents and tools. 5.8 Ladder Logic Capture Experiment Finally, we were interested in exploring the capabilities of Honey-PLC to properly collect Ladder Logic malware that is injected by attackers, following the Case Scenario described in Sec. 4.1. Environment Description. For this experiment, we leveraged the same HoneyPLC AWS test environment described in Sec. 5.5 for our Shodan experiment. Additionally, we locally deployed Conpot, Gaspot, S7commTrace and SCADA HoneyNet. We faced some problems deploying the SCADA HoneyNet as it is currently not maintained at all (the latest version was released in 2004), however, we were able to deploy the S7comm portion of the honeypot, enabling us to conduct this experiment. To test our implementation, we employed PLCinject [17], a research tool published by the SCADACS team, which is capable of injecting arbitrary compiled ladder logic programs into a PLC memory block. We also set up a laptop host with Ubuntu 18.04 LTS installed with the latest version of PLCinject available on Github [4]. Since PLCinject also leverages the Snap7 framework, we installed a custom library and compiled PLCinject from source. We also used the Windows XP host described in Sec. 5.6 with Step7 Manager. Methodology. Fig. 7 illustrates our setup and methodology. The PLCinject host contains the ladder logic program sample that PL-Cinject will upload into HoneyPLC, which resides inside an AWS Figure 7: Capturing Ladder Logic: Initially, the attacker selects a malicious program and leverages PLCinject (1), which then establishes communication with an AWS instance running HoneyPLC (2). Malicious code is injected into a previously-selected memory block exposed by the S7comm server (3), and finally written into a file repository (4). Table 7: S7comm Connections Received by Geolocation. Geolocation S7- 300 S7- 1200 S7- 1500 Geolocation S7- 300 S7- 1200 S7- 1500 United States 359 142 250 Netherlands 22 13 11 United Kingdom 2 1 3 Japan 8 2 2 Turkey 2 0 1 Italy 1 0 0 Switzerland 3 1 1 Iceland 1 1 2 Sweden 1 1 1 Hong Kong 2 1 1 South Korea 1 0 0 Germany 18 9 12 Slovakia 0 1 1 France 10 5 7 Singapore 4 3 5 Denmark 1 1 0 Russia 28 12 14 China 42 16 26 Romania 6 2 4 Canada 3 2 3 Poland 1 0 0 Bulgaria 2 1 0 Panama 2 1 3 Belize 3 3 3 instance exposing a set of standard PLC memory blocks. We leveraged the capabilities of PLCinject to connect and interact with the HoneyPLC host, eventually injecting the desired Ladder Logic program by using the command line. Later, using the Step7 Manager GUI we created a new project and wrote a sample ladder logic to be injected into HoneyPLC. Next, we used the Step7 Manager to list the available memory blocks and then use the upload function to inject the sample ladder logic program into HoneyPLC. Later, we conducted another set of experiments focused on Gaspot, Conpot, S7commTrace, and the SCADA HoneyNet. We configured each of the honeypots with the correct IP addresses and ports, and used PLCinject and the Step7 Manager to write the sample program into them, following the same process used for HoneyPLC.
Results. Our experiments were successful as we were able to inject a sample ladder logic program into HoneyPLC using both, PL-Cinject and the Step7 Manager. After the injection was completed, we logged into our honeypot file system and found the ladder logic file with its corresponding timestamp, which matched the contents of the blocks previously updated to PLCinject, as described in the previous paragraph. More to the point, after the Step7 Manager injection was completed we downloaded our own sample program from HoneyPLC's S7comm server and used the ladder logic editor (included with Step7 Manager) to corroborate that our sample program was in fact saved in HoneyPLC's S7comm server. It is worth mentioning that the Step7 Manager did not crash or threw any errors while interacting with HoneyPLC's S7comm server. This adds evidence as to the level of interaction that HoneyPLC provides. Regarding the Gaspot honeypot, our results show that it is not possible to inject any program into it. In fact, the TCP connection times out, and there is no reply. The results from Conpot show that it can, in fact, open a connection to TCP port 102, however, it is reset, and the program upload cannot continue. S7commTrace results in the S7comm connection not being established. Finally, the S7comm portion of the SCADA HoneyNet accepts the TCP port 102 connection and starts the upload function needed to upload the ladder logic program, however, after the upload function ends, there is no data saved or even transmitted. Overall, these results provide compelling evidence for answering Question Q-3 affirmatively. 6 DISCUSSION AND FUTURE WORK Comparing HoneyPLC with Previous Approaches. Following the comparison shown in Table 1, HoneyPLC provides significant improvements over the current state-of-the-art of honeypots for PLCs. First, HoneyPLC provides better covertness capabilities than the ones provided by related works in the literature, as shown in the experimental procedures summarized in Table 2. Moreover, as detailed in Sec. 4.3, HoneyPLC provides advanced TCP/IP simulation based on Honeyd, plus the careful simulation of different domainspecific protocols. Whereas the simulation of various protocols is shared by many approaches in the literature, only HoneyPLC and SCADA HoneyNet [3] leverage the rich simulation features provided by the Honeyd framework. Second, the extensibility features of HoneyPLC, discussed in Sec. 4.2, allow for the effective simulation of different PLCs deployed in practice, as it was shown in the experimental procedures detailed in Sec. 5.2. Such a feature is not shared by any other approach in the literature, as shown in Table 1. Only a few approaches provide limited extensibility features, but those are mostly based on manually changing some configuration settings for the PLCs they support. As shown in Sec. 4.2, the HoneyPLC's Profiler Tool supports the collection and configuration settings for different real PLCs, which may allow for practitioners to create and distribute PLC Profiles for HoneyPLC for many different brands and models used in practice. Finally, HoneyPLC's Ladder Logic Capture feature is optimal for the understanding and analysis of malicious programs tailored for PLCs, which is not provided by any other related work, as shown in Table 2. Limitations. Despite the innovative features of HoneyPLC, and the promising evaluation results shown in Sec. 5, we identified the following limitations to our approach. First, as shown in Table 1, HoneyPLC does not provide support for modeling physical interactions as depicted by PLCs in practice. To solve this, future versions of HoneyPLC may be enhanced with a generic, general-purpose framework that facilitates the collection and subsequent modeling of physical interactions that can further engage and deceive attackers. Second, despite numerous attempts, we were unable to test HoneyPLC against Stuxnet, shown in Sec. 2.3, up to the point in which PLCs are injected with Ladder Logic code. This problem was also encountered by seasoned partners in industry, as it was revealed to us in private conversations. As an alternative, we strove to replicate a similar code injection scenario as shown in Sec. 5.8. Future Work. First, we plan to add support to other ICS specific network protocols such as Modbus, which is widely implemented by other approaches in the literature. Second, we plan to use HoneyPLC as a basis for simulating rich ICS infrastructures completely in software, modeling components like SCADA and other devices. Current ICS are proprietary, closed, and composed of a plethora of costly devices, which clearly complicates the effective development and testing of new protection tools by researchers. Finally, we plan to turn HoneyPLC into a comprehensive suite for malware analysis for ICS by incorporating Ladder Logic analysis tools such as ICSREF [16], as well as other works such as PLCinject, featured in Sec. 5.8. 7 CONCLUSIONS Attacks targeting ICS are now more real than ever and their consequences may be catastrophic. In this paper we have introduced HoneyPLC, a convenient and flexible honeypot, which significantly pushes the state-of-the-art of the field forward. Additionally, we have provided experimental evidence that demonstrates that HoneyPLC outperforms existing honeypots in the literature, achieving a performance level comparable to real PLC devices. Finally, the HoneyPLC advanced extensibility features may allow for practitioners to create and openly distribute many new PLC Profiles for a variety of PLCs used in practice, thus positioning HoneyPLC not only as a helpful tool for preventing and deterring ongoing attacks, but also as the starting point for designing and evaluating new protection technologies for mission-critical cyber-physical systems and infrastructure. ACKNOWLEDGMENTS We would like to express our gratitude to the anonymous reviewers for their thoughtful feedback. This work was supported in part by the National Science Foundation (NSF) under grant 1651661, the Department of Energy (DoE) under grant DE-OE0000780, the Army Research Office under grant W911NF-17-1-0370, the Defense Advanced Research Projects Agency (DARPA) under the agreements HR001118C0060 and FA875019C0003, the Institute for Information & communications Technology Promotion (IITP) under grant 2017- 0-00168 funded by the Korea government (MSIT), and by a grant from the Center for Cybersecurity and Digital Forensics (CDF) at Arizona State University. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Government or any agency thereof.
REFERENCES [1] 2016. S7comm - The Wireshark Wiki. https://wiki.wireshark.org/S7comm. [2] 2020. DataSoft/Honeyd.https://github.com/DataSoft/Honeyd original-date: 2011-12-09T22:40:03Z. [3] 2020. SCADA HoneyNet Project: Building Honeypots for Industrial Networks. http://scadahoneynet.sourceforge.net/ [4] 2020. SCADACS/PLCinject.https://github.com/SCADACS/PLCinject originaldate: 2015-07-13T09:38:19Z. [5] ABB. [n.d.]. PLC Automation. https://new.abb.com/plc. Accessed: 2020-02-24. [6] Allen-Bradley. [n.d.]. Programmable Controllers. https://ab.rockwellautomation. com/Programmable-Controllers. Accessed: 2020-02-24. [7] Daniele Antonioli, Anand Agrawal, and Nils Ole Tippenhauer. 2016. Towards high-interaction virtual ICS honeypots-in-a-box. In Proc. of the 2nd ACM Workshop on Cyber-Physical Systems Security and Privacy. 13–22. [8] Hans Berger. 2006. Automating with STEP7 in STL and SCL: programmable controllers Simatic S7-300/400. Publicis. [9] Dániel István Buza, Ferenc Juhász, György Miru, Márk Félegyházi, and Tamás Holczer. 2014. CryPLH: Protecting smart energy systems from targeted attacks with a PLC honeypot. In Int. Workshop on Smart Grid Security. Springer, 181–192. [10] Jianhong Cao, Wei Li, Jianjun Li, and Bo Li. 2017. Dipot: A distributed industrial honeypot system. In Int. Conference on Smart Computing and Communication. Springer, 300–309. [11] Defense Use Case. 2016. Analysis of the cyber attack on the Ukrainian power grid. Electricity Information Sharing and Analysis Center (E-ISAC) 388 (2016). [12] Nicolas Falliere, Liam O Murchu, and Eric Chien. 2011. W32. stuxnet dossier. White paper, Symantec Corp., Security Response 5, 6 (2011), 29. [13] Kevin E Hemsley, E Fisher, et al. 2018. History of industrial control system cyber incidents. Technical Report. Idaho National Lab.(INL), Idaho Falls, ID (United States). [14] Stephen Hilt. 2016. GasPot Released at Blackhat 2015. https://github.com/sjhilt/ GasPot. [15] Arthur Jicha, Mark Patton, and Hsinchun Chen. 2016. SCADA honeypots: An in-depth analysis of Conpot. In 2016 IEEE conference on intelligence and security informatics (ISI). IEEE, 196–198. [16] Anastasis Keliris and Michail Maniatakos. 2019. ICSREF: A Framework for Automated Reverse Engineering of Industrial Control Systems Binaries. In Network and Distributed System Security Symposium, (NDSS) (San Diego, California, USA). The Internet Society. [17] Johannes Klick, Stephan Lau, Daniel Marzin, Jan-Ole Malchow, and Volker Roth. 2015. Internet-facing PLCs-a new back orifice. Blackhat USA (2015), 22–26. [18] Jan Kneschke. 2020. Lighttpd-fly light. https://www.lighttpd.net/. [19] Kamil Kołtyś and Robert Gajewski. 2015. Shape: A honeypot for electric power substation. J. of Telecommunications and Information Technology 4 (2015), 37–43. [20] Ralph Langner. 2011. Stuxnet: Dissecting a cyberwarfare weapon. IEEE Security & Privacy 9, 3 (2011), 49–51. [21] Samuel Litchfield, David Formby, Jonathan Rogers, Sakis Meliopoulos, and Raheem Beyah. 2016. Rethinking the honeypot for cyber-physical systems. IEEE Internet Computing 20, 5 (2016), 9–17. [22] Gordon Fyodor Lyon. 2009. Nmap network scanning: The official Nmap project guide to network discovery and security scanning. Insecure. [23] John Matherly. 2015. Complete guide to Shodan. Shodan, LLC (2016-02-25) 1 (2015). [24] John Matherly. 2019. personal communication. [25] Iyatiti Mokube and Michele Adams. 2007. Honeypots: concepts, approaches, and challenges. In Proc. of the 45th annual southeast regional conference. 321–326. [26] MushMush. 2020. Conpot. https://github.com/mushorg/conpot. [27] Davide Nardella. 2018. Snap7. http://snap7.sourceforge.net/. [28] Niels Provos. 2003. Honeyd-a virtual honeypot daemon. In 10th DFN-CERT Workshop, Hamburg, Germany, Vol. 2. 4. [29] Niels Provos and Thorsten Holz. 2007. Virtual honeypots: from botnet tracking to intrusion detection. Pearson Education. [30] Ubuntu Manpage Repository. 2019. snmpwalk - retrieve a subtree of management values using SNMP GETNEXT requests. http://manpages.ubuntu.com/manpages/ bionic/man1/snmpwalk.1.html. [31] Ubuntu Manpage Repository. 2019. Wget - The non-interactive network downloader. http://manpages.ubuntu.com/manpages/disco/en/man1/wget.1.html. [32] SCADACS. 2017. Snap7. https://github.com/SCADACS/snap7. [33] Justin Searle. 2015. plcscan. https://github.com/meeas/plcscan. [34] Siemens. [n.d.]. The intelligent choice for your automation task: SIMATIC controllers. https://new.siemens.com/global/en/products/automation/systems/ industrial/plc.html. Accessed: 2020-02-24. [35] Joe Slowik. 2018. Anatomy of an Attack: Detecting and Defeating CRASHOVER-RIDE. VB2018, October (2018). [36] Keith Stouffer, Joe Falco, and Karen Scarfone. 2008. NIST Special Publication 800-82: Guide to Industrial Control Systems (ICS) Security. Gaithersburg, MD: National Institute of Standards and Technology (NIST) (2008). [37] Keith Stouffer, S Lightman, V Pillitteri, Marshall Abrams, and Adam Hahn. 2014. NIST special publication 800-82, revision 2: Guide to industrial control systems (ICS) security. National Institute of Standards and Technology (2014). [38] Susan Marie Wade. 2011. SCADA Honeynets: The attractiveness of honeypots as critical infrastructure security tools for the detection and analysis of advanced threats. (2011). [39] Kyle Wilhoit and Stephen Hilt. [n.d.]. The GasPot Experiment: Unexamined Perils in Using. ([n. d.]). [40] Feng Xiao, Enhong Chen, and Qiang Xu. 2017. S7commTrace: A High Interactive Honeypot for Industrial Control System Based on S7 Protocol. In Int. Conference on Information and Communications Security. Springer, 412–423.